In [18]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

1. Daugiausia ir geriausios kavos užauginama ir pagaminama Pietų Amerikos šalyse¶

1.1. Nustatyti kurioje šalyje yra daugiausia kavos gamintojų¶

1.2. Nustatyti, kurios šalies kavos pupelės gavo aukščiausius balus¶

1.3. Nustatyti, kurios šalies pupeles gamintojai renkasi daugiausia¶

2. Duomenų įkėlimas¶

In [13]:
coffee = pd.read_csv("df_arabica_clean_.csv", index_col=0)
coffee.head(3)
Out[13]:
ID Country of Origin Farm Name Lot Number Mill ICO Number Company Altitude Region Producer ... Defects Total Cup Points Moisture Percentage Category One Defects Quakers Color Category Two Defects Certification Body Certification Address Certification Contact
0 0 Colombia Finca El Paraiso CQU2022015 Finca El Paraiso NaN Coffee Quality Union 1930 Piendamo,Cauca Diego Samuel Bermudez ... 0 89.33 11.8 0 0 green 3 Japan Coffee Exchange 〒413-0002 静岡県熱海市伊豆山1173−58 1173-58 Izusan, Ata... 松澤 宏樹 Koju Matsuzawa - +81(0)9085642901
1 1 Taiwan Royal Bean Geisha Estate The 2022 Pacific Rim Coffee Summit,T037 Royal Bean Geisha Estate NaN Taiwan Coffee Laboratory 1200 Chiayi 曾福森 ... 0 87.58 10.5 0 0 blue-green 0 Taiwan Coffee Laboratory 台灣咖啡研究室 QAHWAH CO., LTD 4F, No. 225, Sec. 3, Beixin Rd... Lin, Jen-An Neil 林仁安 - 886-289116612
2 2 Laos OKLAO coffee farms The 2022 Pacific Rim Coffee Summit,LA01 oklao coffee processing plant NaN Taiwan Coffee Laboratory 1300 Laos Borofen Plateau WU TAO CHI ... 0 87.42 10.4 0 0 yellowish 2 Taiwan Coffee Laboratory 台灣咖啡研究室 QAHWAH CO., LTD 4F, No. 225, Sec. 3, Beixin Rd... Lin, Jen-An Neil 林仁安 - 886-289116612

3 rows × 37 columns

In [325]:
coffee.shape
Out[325]:
(207, 37)
In [326]:
coffee.info()
<class 'pandas.core.frame.DataFrame'>
Index: 207 entries, 0 to 206
Data columns (total 37 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   ID                     207 non-null    int64  
 1   Country of Origin      207 non-null    object 
 2   Farm Name              205 non-null    object 
 3   Lot Number             206 non-null    object 
 4   Mill                   204 non-null    object 
 5   ICO Number             75 non-null     object 
 6   Company                207 non-null    object 
 7   Altitude               207 non-null    int64  
 8   Region                 205 non-null    object 
 9   Producer               206 non-null    object 
 10  Number of Bags         207 non-null    int64  
 11  Bag Weight             207 non-null    object 
 12  In-Country Partner     207 non-null    object 
 13  Owner                  207 non-null    object 
 14  Variety                201 non-null    object 
 15  Status                 207 non-null    object 
 16  Processing Method      202 non-null    object 
 17  Aroma                  207 non-null    float64
 18  Flavor                 207 non-null    float64
 19  Aftertaste             207 non-null    float64
 20  Acidity                207 non-null    float64
 21  Body                   207 non-null    float64
 22  Balance                207 non-null    float64
 23  Uniformity             207 non-null    float64
 24  Clean Cup              207 non-null    int64  
 25  Sweetness              207 non-null    int64  
 26  Overall                207 non-null    float64
 27  Defects                207 non-null    int64  
 28  Total Cup Points       207 non-null    float64
 29  Moisture Percentage    207 non-null    float64
 30  Category One Defects   207 non-null    int64  
 31  Quakers                207 non-null    int64  
 32  Color                  207 non-null    object 
 33  Category Two Defects   207 non-null    int64  
 34  Certification Body     207 non-null    object 
 35  Certification Address  207 non-null    object 
 36  Certification Contact  207 non-null    object 
dtypes: float64(10), int64(9), object(18)
memory usage: 61.5+ KB
In [7]:
for column in coffee.columns:
    distinct_values = len(coffee[column].unique())
    print(f"{column}: {distinct_values} distinct values")
ID: 207 distinct values
Country of Origin: 22 distinct values
Farm Name: 173 distinct values
Lot Number: 187 distinct values
Mill: 163 distinct values
ICO Number: 68 distinct values
Company: 72 distinct values
Altitude: 67 distinct values
Region: 121 distinct values
Producer: 173 distinct values
Number of Bags: 55 distinct values
Bag Weight: 39 distinct values
In-Country Partner: 21 distinct values
Owner: 80 distinct values
Variety: 49 distinct values
Status: 1 distinct values
Processing Method: 11 distinct values
Aroma: 19 distinct values
Flavor: 19 distinct values
Aftertaste: 20 distinct values
Acidity: 19 distinct values
Body: 17 distinct values
Balance: 18 distinct values
Uniformity: 3 distinct values
Clean Cup: 1 distinct values
Sweetness: 1 distinct values
Overall: 21 distinct values
Defects: 1 distinct values
Total Cup Points: 81 distinct values
Moisture Percentage: 46 distinct values
Category One Defects: 6 distinct values
Quakers: 11 distinct values
Color: 12 distinct values
Category Two Defects: 14 distinct values
Certification Body: 21 distinct values
Certification Address: 21 distinct values
Certification Contact: 21 distinct values
In [64]:
coffee.drop(['ICO Number', 'Lot Number','In-Country Partner', 'Quakers', 
             'Certification Address', 'Certification Contact', 'Owner', 'Status', 'Overall', 'Certification Body', 'Certification Address', 'Certification Contact'], axis=1).head(3)
Out[64]:
ID Country of Origin Farm Name Mill Company Altitude Region Producer Number of Bags Bag Weight ... Balance Uniformity Clean Cup Sweetness Defects Total Cup Points Moisture Percentage Category One Defects Color Category Two Defects
0 0 Colombia Finca El Paraiso Finca El Paraiso Coffee Quality Union 1930 Piendamo,Cauca Diego Samuel Bermudez 1 35 ... 8.42 10.0 10 10 0 89.33 11.8 0 green 3
1 1 Taiwan Royal Bean Geisha Estate Royal Bean Geisha Estate Taiwan Coffee Laboratory 1200 Chiayi 曾福森 1 80 ... 8.25 10.0 10 10 0 87.58 10.5 0 blue-green 0
2 2 Laos OKLAO coffee farms oklao coffee processing plant Taiwan Coffee Laboratory 1300 Laos Borofen Plateau WU TAO CHI 19 25 ... 8.17 10.0 10 10 0 87.42 10.4 0 yellowish 2

3 rows × 27 columns

3. Analizė¶

In [17]:
coffee.groupby('Country of Origin').agg({'Aroma': 'mean',
                                'Aftertaste': 'mean',
                                'Acidity': 'mean',
                                'Body': 'mean',
                                'Balance': 'mean',
                                'Sweetness': 'mean',
                                'Clean Cup': 'mean',
                                'Uniformity': 'mean',
                                }).sort_values(by='Aroma')
Out[17]:
Aroma Aftertaste Acidity Body Balance Sweetness Clean Cup Uniformity
Country of Origin
El Salvador 7.332857 7.261429 7.525714 7.475714 7.334286 10.0 10.0 10.000000
Thailand 7.466667 7.526667 7.521667 7.603333 7.598333 10.0 10.0 10.000000
Brazil 7.475000 7.359000 7.416000 7.426000 7.342000 10.0 10.0 10.000000
Nicaragua 7.475714 7.240000 7.368571 7.500000 7.512857 10.0 10.0 10.000000
Costa Rica 7.592500 7.532500 7.730000 7.708750 7.633750 10.0 10.0 10.000000
Vietnam 7.645000 7.477500 7.522500 7.542500 7.605000 10.0 10.0 10.000000
Mexico 7.667500 7.437500 7.522500 7.645000 7.457500 10.0 10.0 10.000000
Peru 7.667500 7.415000 7.520000 7.352500 7.415000 10.0 10.0 10.000000
Guatemala 7.697619 7.686667 7.758095 7.766190 7.726667 10.0 10.0 10.000000
Honduras 7.706154 7.536923 7.607692 7.607692 7.570769 10.0 10.0 10.000000
Indonesia 7.723333 7.610000 7.776667 7.640000 7.610000 10.0 10.0 10.000000
United States (Hawaii) 7.734000 7.632000 7.666000 7.516000 7.700000 10.0 10.0 10.000000
Laos 7.750000 7.496667 7.613333 7.526667 7.583333 10.0 10.0 10.000000
Colombia 7.780526 7.618421 7.736842 7.653684 7.650526 10.0 10.0 10.000000
Panama 7.790000 7.460000 7.795000 7.625000 7.585000 10.0 10.0 10.000000
Myanmar 7.830000 7.670000 7.580000 7.580000 7.580000 10.0 10.0 10.000000
Uganda 7.833333 7.610000 7.583333 7.666667 7.666667 10.0 10.0 10.000000
Kenya 7.835000 7.625000 7.540000 7.585000 7.750000 10.0 10.0 10.000000
Taiwan 7.842787 7.712131 7.765246 7.661311 7.730984 10.0 10.0 9.967213
Ethiopia 7.864545 7.757273 8.016364 7.819091 7.774545 10.0 10.0 10.000000
Tanzania, United Republic Of 7.903333 7.750000 7.863333 7.848333 7.776667 10.0 10.0 10.000000
Madagascar 7.920000 7.580000 7.750000 7.580000 7.670000 10.0 10.0 10.000000
In [78]:
coffee.groupby('Country of Origin')['Country of Origin'].count().sort_values(ascending=False)
Out[78]:
Country of Origin
Taiwan                          61
Guatemala                       21
Colombia                        19
Honduras                        13
Thailand                        12
Ethiopia                        11
Brazil                          10
Costa Rica                       8
Nicaragua                        7
El Salvador                      7
Tanzania, United Republic Of     6
United States (Hawaii)           5
Peru                             4
Mexico                           4
Vietnam                          4
Laos                             3
Uganda                           3
Indonesia                        3
Panama                           2
Kenya                            2
Myanmar                          1
Madagascar                       1
Name: Country of Origin, dtype: int64
In [237]:
coffee.groupby(['Country of Origin', 'Company'])['Flavor'].mean().sort_values(ascending=False)
Out[237]:
Country of Origin             Company                           
Laos                          Taiwan Coffee Laboratory              8.420
Ethiopia                      Marubeni corp.,                       8.250
                              Kerchanshe Trading PLC                8.170
Taiwan                        Dongbi coffee                         8.170
Tanzania, United Republic Of  DORMAN (T) LIMITED                    8.085
                                                                    ...  
Laos                          Marubeni Corporation                  7.170
Nicaragua                     SAJONIA ESTATE COFFEE S.A.            7.170
Brazil                        marubeni                              7.170
El Salvador                   Aprentium Enterprises LLC             7.110
Brazil                        Descafeinadores Mexicano SA. de CV    7.080
Name: Flavor, Length: 98, dtype: float64
In [3]:
plt.style.use("fivethirtyeight")
plt.figure(figsize=(20,7))
plt.xticks(rotation=90)

df1 = coffee.sort_values(by='Total Cup Points')
plt.bar(x=df1['Country of Origin'],
        height=df1['Total Cup Points'].sort_values(ascending=False), width=0.5
       )
plt.show()
In [388]:
df2 = coffee[['Country of Origin', 'Company','Number of Bags', 'Bag Weight']]
df2['Total_kg'] = coffee['Number of Bags'] * (coffee['Bag Weight'].str.replace('kg', ' ').astype('int64'))
df2
C:\Users\pirag\AppData\Local\Temp\ipykernel_8188\1904100434.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2['Total_kg'] = coffee['Number of Bags'] * (coffee['Bag Weight'].str.replace('kg', ' ').astype('int64'))
Out[388]:
Country of Origin Company Number of Bags Bag Weight Total_kg
0 Colombia Coffee Quality Union 1 35 kg 35
1 Taiwan Taiwan Coffee Laboratory 1 80 kg 80
2 Laos Taiwan Coffee Laboratory 19 25 kg 475
3 Costa Rica Coffee Quality Union 1 22 kg 22
4 Colombia Coffee Quality Union 2 24 kg 48
... ... ... ... ... ...
202 Brazil Ipanema Coffees 2240 60 kg 134400
203 Nicaragua Exportadora Atlantic S.A 300 30 kg 9000
204 Laos Marubeni Corporation 343 60 kg 20580
205 El Salvador Aprentium Enterprises LLC 1 2 kg 2
206 Brazil Descafeinadores Mexicano SA. de CV 600 60 kg 36000

207 rows × 5 columns

In [391]:
coffee.groupby('Country of Origin')['Total_kg'].sum().sort_values(ascending=False)
Out[391]:
Country of Origin
Ethiopia                        12389860.0
Brazil                            320325.0
Guatemala                         245370.0
Honduras                          170490.0
Colombia                          161512.0
Tanzania, United Republic Of       94200.0
Nicaragua                          93630.0
Costa Rica                         71014.0
Peru                               56940.0
El Salvador                        55581.0
Uganda                             47400.0
Indonesia                          38460.0
Mexico                             30087.0
Laos                               22615.0
Kenya                              18065.0
Panama                             12000.0
Taiwan                              4317.0
Myanmar                             3000.0
Vietnam                             2491.0
United States (Hawaii)              1510.0
Thailand                             431.0
Madagascar                           300.0
Name: Total_kg, dtype: float64

Išvados¶

Pirminė hipotezė iš dalies pasitvirtino, nes didžiausius balaus (Total cup points) už visus kriterijus (Aroma, Flavor, Aftertaste, Acidity, Body, Balance, Uniformity, Clean Cup, Sweetness) gavo pupelės iš Brazilijos. Tačiau paaiškėjo, kad daugiausia kavos gamintojų buvo iš Taivano, o daugiausia kavos pupelių gamintojai rinkosi iš Etiopijos.